A Method to Generate Decision Rules Automatically for Image Analysis 


In this report, we propose a method to generate rules automatically for image analysis tasks such
as segmentation. The method is best described by the following paper, submitted to the North
American Fuzzy Information Processing Society (NAFIPS ’92). For this report, the paper is
reproduced with slight modifications: only the experimental example differs from the original paper.
ABSTRACT

Many high-level vision systems use rule-based approaches to solve problems such as 
autonomous navigation and image understanding. The rules are usually elaborated by experts. 
However, this procedure may be rather tedious. In this paper, we propose a method to generate 
such rules automatically from training data. The proposed method is also capable of filtering out 
irrelevant features and criteria from the rules. 

1. Introduction

High-level computer vision involves complex tasks such as image understanding and scene
interpretation. In domains where the models of the objects in the image can be precisely defined
(such as the blocks world, or even the world of generalized cylinders), existing techniques for
description and interpretation perform quite well. However, when this is not the case (as with
outdoor scenes or extra-terrestrial environments), traditional techniques do not work well.
For this reason, we believe that the greatest contribution of fuzzy set theory to computer vision will
be in the area of high-level vision. Unfortunately, very little work has been done in this highly
promising area. Fuzzy set theoretic approaches to high-level vision have the following advantages
over traditional techniques: i) they can easily deal with imprecise and vague properties,
descriptions, and rules, ii) they degrade more gracefully when the input information is incomplete,
iii) a given task can be achieved with a more compact set of rules, iv) the inferencing and the
uncertainty (belief) maintenance can both be done in one consistent framework, and v) they are
sufficiently flexible to accommodate several types of rules other than just IF-THEN rules. Examples
of the types of rules that can be represented in a fuzzy framework are [1] possibility rules
("The more X is A, the more possible that B is the range for Y"), certainty rules ("The more X is
A, the more certain Y lies in B"), gradual rules ("The more X is A, the more Y is B"), and unless
rules [2] ("if X is A, then Y is B unless Z is C").

The determination of properties and attributes of image regions and spatial relationships 
among regions is critical for higher level vision processes involved in tasks such as autonomous 
navigation, medical image analysis and scene interpretation. Many high-level systems have been 
designed using a rule-based approach [3,4]. In these systems, common-sense knowledge about the
world is represented in terms of rules, and the rules are then used in an inference mechanism to
arrive at a meaningful interpretation of the contents of the image. In a rule-based system to interpret
outdoor scenes, typical rules might be:

IF a REGION is RATHER THIN AND SOMEWHAT STRAIGHT 

THEN it is a ROAD 

IF a REGION is RATHER GREEN AND HIGHLY TEXTURED AND 
IF the REGION is BELOW a SKY REGION 

THEN it is TREES 
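To make the first rule concrete, here is a minimal sketch of how its antecedent might be evaluated. The features (an elongation ratio and a straightness score in [0, 1]), the ramp-shaped membership functions standing in for RATHER THIN and SOMEWHAT STRAIGHT, and the use of min for AND are all illustrative assumptions, not definitions from this report.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership: 0 below lo, 1 above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


# Hypothetical membership functions; the thresholds are illustrative only.
def rather_thin(elongation):
    # elongation: assumed length/width ratio of the region
    return ramp(elongation, 2.0, 6.0)


def somewhat_straight(straightness):
    # straightness: assumed score in [0, 1], 1 meaning perfectly straight
    return ramp(straightness, 0.3, 0.8)


def road_support(elongation, straightness):
    # Fuzzy AND modelled by min (one common choice of intersection)
    return min(rather_thin(elongation), somewhat_straight(straightness))


print(road_support(5.0, 0.9))  # 0.75
```

The degree of support for ROAD is limited by the weaker of the two antecedent clauses, so a region that is perfectly straight but only moderately thin receives moderate support.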

Attributes such as "THIN" and "NARROW", and properties such as "BRIGHT" and
"TEXTURED" defy precise definitions, and they are best modeled by fuzzy sets. Similarly, spatial
relationships such as "LEFT OF", "ABOVE" and "BELOW" are difficult to model using traditional
all-or-nothing techniques [5]. We may interpret the attributes, properties and relationships
as "criteria". Therefore, we believe that a fuzzy approach to high-level vision will yield more
realistic results.

In most rule-based systems, the rules are usually enumerated by experts, although they 
may also be generated by a learning process. Several techniques have been suggested in the 
literature to generate rules for control problems [6-9], some of which use neural net methods to 
model the control system [7-12]. These systems convert a given set of inputs to an output by
fuzzifying the inputs, performing fuzzy inference, and finally defuzzifying the result of the
inference to generate a crisp output [13]. Some of the methods also "tune" the membership
functions that define the levels (such as "LOW", "MEDIUM" and "HIGH") of the input variables
[10]. While these methods have been shown to be very effective in solving control problems, they
cannot be directly used in high-level vision applications. For example, in control systems, the
fuzzy rules have consequents which are usually a desired level of a control signal, whereas in
high-level vision, the consequent clauses are usually fuzzy labels. Also, it is desirable that the
membership functions for levels of fuzzy attributes such as "THIN" and "NARROW", and of properties
such as "BRIGHT", be related to how humans perceive such attributes or properties. Hence, these
membership functions have very little to do with the decision making or reasoning process in which
they are employed. In many
reasoning systems for high-level vision, confidence (or importance) factors are associated with 
every rule since the confidence in the labeling may depend on the confidence of the rule itself. In 
this paper, we propose a new method to generate rules for high-level vision applications 
automatically. The rules so obtained may be combined with the rules given by the experts to 
complete the rule base. 

In Section 2, we describe several fuzzy aggregation operators which can be used in 
hierarchical (multi-layer) aggregation networks for multi-criteria decision making. In Section 3, we 
describe how these aggregation networks can be used to filter out irrelevant attributes, properties, 
and relationships and at the same time generate a compact set of fuzzy rules (with associated 
confidence factors) that describes the decision making process. In Section 4 we present some 
experimental results on automatic rule generation. Finally, Section 5 contains the summary and
conclusions.


2. Fuzzy Aggregation Operators 

Fuzzy set theory provides a host of very attractive aggregation connectives for integrating 
membership values representing uncertain and subjective information [14]. These connectives can 
be categorized into the following three classes based on their aggregation behavior: i) union
connectives, ii) intersection connectives, and iii) compensative connectives. Union connectives
produce a high output whenever any one of the input values representing different features or
criteria is high. Intersection connectives produce a high output only when all of the inputs have
high values. Compensative connectives are used when one might be willing to sacrifice a little on
one factor, provided the loss is compensated by a gain in another factor. Compensative connectives
can be further classified into mean operators and hybrid operators. Mean operators are monotonic
operators that satisfy the condition $\min(a,b) \le \mathrm{mean}(a,b) \le \max(a,b)$. The generalized
mean operator [15], given below, is one such operator.


$$
g(x_1,\ldots,x_n;\,p,w_1,\ldots,w_n) = \left(\sum_{i=1}^{n} w_i\,x_i^{\,p}\right)^{1/p}, \quad \text{where} \quad \sum_{i=1}^{n} w_i = 1 \tag{1}
$$
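A direct transcription of (1) is sketched below. It assumes memberships strictly inside (0, 1] so that zero and negative values of p are defined, and it treats p = 0 as the weighted geometric mean, which is the limit of (1) as p approaches 0. The loop illustrates the monotonicity in p: the output moves from near the minimum of the inputs to near their maximum.

```python
import math


def generalized_mean(x, w, p):
    """Weighted generalized mean of equation (1).

    x: memberships in (0, 1]; w: nonnegative weights summing to 1;
    p: real exponent (p == 0 is handled as the weighted geometric mean).
    """
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if p == 0:
        # Limiting case p -> 0: weighted geometric mean
        return math.exp(sum(wi * math.log(xi) for xi, wi in zip(x, w)))
    return sum(wi * xi ** p for xi, wi in zip(x, w)) ** (1.0 / p)


x = [0.2, 0.9]
w = [0.5, 0.5]
for p in (-200, -1, 0, 1, 200):
    # Large negative p approaches min(x); large positive p approaches max(x)
    print(p, round(generalized_mean(x, w, p), 4))
```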


The $w_i$'s can be thought of as the relative importance factors for the different criteria. The
generalized mean has several attractive properties. For example, the mean value increases
monotonically with p [15]. Thus, by varying the value of p between $-\infty$ and $+\infty$, we can
obtain all values between min and max. Therefore, in the extreme cases ($p \to -\infty$ or
$p \to +\infty$), this operator can be used as an intersection or a union, respectively. The γ-model
devised by Zimmermann and Zysno [16] is an example of a hybrid operator, and it is defined by

$$
y = \left(\prod_{i=1}^{n} x_i^{\delta_i}\right)^{1-\gamma} \left(1 - \prod_{i=1}^{n} (1 - x_i)^{\delta_i}\right)^{\gamma}, \quad \text{where} \quad \sum_{i=1}^{n} \delta_i = n \ \text{and} \ 0 \le \gamma \le 1. \tag{2}
$$
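Equation (2) can be transcribed as follows; the function name and the check on the exponents δ_i are choices made for this sketch (and `math.prod` needs Python 3.8 or later). With γ = 0 the model reduces to the weighted product (an intersection), and with γ = 1 to the weighted algebraic sum (a union).

```python
import math


def gamma_model(x, delta, gamma):
    """Zimmermann-Zysno gamma-model of equation (2).

    x: memberships in [0, 1]; delta: exponents summing to n = len(x);
    gamma in [0, 1] sets the degree of compensation.
    """
    n = len(x)
    if abs(sum(delta) - n) > 1e-9:
        raise ValueError("the delta_i must sum to n")
    # Weighted product: the intersection part
    intersection = math.prod(xi ** di for xi, di in zip(x, delta))
    # Weighted algebraic sum: the union part
    union = 1.0 - math.prod((1.0 - xi) ** di for xi, di in zip(x, delta))
    return intersection ** (1.0 - gamma) * union ** gamma


for g in (0.0, 0.5, 1.0):
    print(g, gamma_model([0.5, 0.5], [1.0, 1.0], g))
```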

In general, hybrid operators are defined as the weighted arithmetic or geometric mean of a pair of 
fuzzy union and intersection operators as follows. 
$$
A \oplus_{\gamma} B = (1-\gamma)(A \cap B) + \gamma\,(A \cup B) \tag{3}
$$

$$
A \otimes_{\gamma} B = (A \cap B)^{1-\gamma}\,(A \cup B)^{\gamma} \tag{4}
$$

The parameter γ in (3) and (4) controls the degree of compensation. The γ-model in (2) is a hybrid
operator of the type in (4), with the weighted product as the intersection and the weighted algebraic
sum as the union. The compensative connectives are very powerful and flexible in that, by choosing
suitable parameters, one can control not only the nature (e.g., conjunctive, disjunctive or
compensative) but also the attitude (e.g., pessimistic or optimistic) of the aggregation.
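The two hybrid forms (3) and (4) can be sketched with the intersection and union passed in as functions. The pairs used below (min/max, and product/algebraic sum) are common choices picked for illustration; with the product/algebraic-sum pair and two inputs, (4) coincides with the γ-model (2) for δ_1 = δ_2 = 1.

```python
def additive_hybrid(a, b, gamma, t_norm, t_conorm):
    """Equation (3): (1 - gamma) * (A AND B) + gamma * (A OR B)."""
    return (1.0 - gamma) * t_norm(a, b) + gamma * t_conorm(a, b)


def multiplicative_hybrid(a, b, gamma, t_norm, t_conorm):
    """Equation (4): (A AND B) ** (1 - gamma) * (A OR B) ** gamma."""
    return t_norm(a, b) ** (1.0 - gamma) * t_conorm(a, b) ** gamma


def product(a, b):
    """Algebraic product, an intersection (t-norm)."""
    return a * b


def algebraic_sum(a, b):
    """Algebraic (probabilistic) sum, a union (t-conorm)."""
    return a + b - a * b


a, b = 0.3, 0.8
for g in (0.0, 0.5, 1.0):
    # gamma = 0 gives the pure intersection, gamma = 1 the pure union
    print(g,
          additive_hybrid(a, b, g, min, max),
          multiplicative_hybrid(a, b, g, product, algebraic_sum))
```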

One can formulate the problem of multicriteria decision making as follows. The support for 
a decision may depend on supports for (or degrees of satisfaction of) several different criteria, and 
the degree of satisfaction of each criterion may in turn depend on degrees of satisfaction of other 
sub-criteria, and so on. Thus, the decision process can be viewed as a hierarchical network, where 
each node in the network "aggregates" the degree of satisfaction of a particular criterion from the 
observed support. The inputs to each node are the degrees of satisfaction of each of the sub- 
criteria, and the output is the aggregated degree of satisfaction of the criterion. Thus, the decision 
making problem reduces to i) selecting robust and useful criteria for the problem at hand, ii)
finding ways to generate memberships (degrees of satisfaction of criteria) based on values of 
features (criteria) selected, and iii) determining the structure of the network and the nature of the 
connectives at each node of the network. This includes discarding irrelevant criteria to make the 
network simple and robust. 
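The hierarchical view above can be sketched as a recursive evaluation in which every internal node aggregates its children with the generalized mean (1). The tree encoding, the criterion names and the parameter values below are hypothetical: criterion A is satisfied by either of two sub-criteria (large positive p, OR-like), and the decision requires both A and B (negative p, AND-like).

```python
def gmean(x, w, p):
    """Weighted generalized mean (equation (1)); x in (0, 1], sum(w) = 1, p != 0."""
    return sum(wi * xi ** p for xi, wi in zip(x, w)) ** (1.0 / p)


def evaluate(node, memberships):
    """Recursively evaluate a hierarchical aggregation network.

    A leaf is the name of an observed sub-criterion (a key of `memberships`);
    an internal node is a tuple (children, weights, p).
    """
    if isinstance(node, str):
        return memberships[node]
    children, weights, p = node
    inputs = [evaluate(child, memberships) for child in children]
    return gmean(inputs, weights, p)


# Hypothetical two-level network
criterion_a = (["a1", "a2"], [0.5, 0.5], 8.0)      # large p: OR-like
decision = ([criterion_a, "b"], [0.5, 0.5], -8.0)  # negative p: AND-like
print(evaluate(decision, {"a1": 0.9, "a2": 0.1, "b": 0.7}))
```

The result stays close to the weaker of its two inputs, as an AND-like node should, while criterion A itself is driven by its stronger sub-criterion.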

In our previous research, we have investigated the properties of several union and
intersection operators, the generalized mean, and the γ-model [14,17]. We have shown that
optimization procedures based on gradient descent and random search can be used to determine the
proper type of aggregation connective and parameters at each node, given only an approximate
structure of the network and given a set of training data that represent the inputs at the bottom-most
level and the desired outputs at the top-most level [14,17]. In this paper, we extend this idea to the
detection of irrelevant attributes and automatic rule generation.



3. Redundancy Analysis and Rule Generation 

In the approach we propose, we first fuzzily partition the range of values that each criterion
(a property, an attribute, or a relation) can take into several linguistic intervals such as LOW,
MEDIUM and HIGH. The properties, attributes and relations used are those that may appear in the
antecedent clause of a rule. As explained in Section 1, the membership function for each level
needs to be determined according to how humans perceive such attributes, properties or relations.
The membership values of an observed attribute, property or relationship value in each of the levels
are calculated using such membership functions. (Methods to generate degrees of satisfaction of
relationships such as "LEFT OF" may be found in [18].) The memberships are then aggregated in a
fuzzy aggregation network of the type shown in Figure 1. The top nodes of the network represent the
labels that may appear in the consequents of the rules. A suitable structure for the network, and
suitable fuzzy aggregation operators for each node, are chosen. The network is then trained with
typical attribute, property or relationship data, together with the corresponding desired output
values for the various labels, to learn the aggregation connectives and connections that best
describe the input-output relationships. The learning may be implemented using a gradient descent
approach similar to the backpropagation algorithm [14,17]. Note that the weights are constrained:
as in (1), the weights at each node must sum to one.
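The text does not specify how the weight constraint is maintained during gradient descent. Two common options, shown here only as assumptions, are to parameterize the weights through a softmax (so any unconstrained parameter vector yields valid weights) or to clip weights after each step and renormalize. Nonnegativity is also assumed, since the weights are read as importance factors.

```python
import math


def softmax(v):
    """Map unconstrained parameters v to nonnegative weights that sum to 1."""
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(vi - m) for vi in v]
    s = sum(e)
    return [ei / s for ei in e]


def project_weights(w, floor=0.0):
    """Clip weights below `floor` after a gradient step, then renormalize to sum 1.

    Assumes at least one weight stays positive after clipping.
    """
    clipped = [max(wi, floor) for wi in w]
    s = sum(clipped)
    return [wi / s for wi in clipped]


print(softmax([0.0, 0.0, 0.0]))           # equal weights, as at the start of training
print(project_weights([0.7, -0.1, 0.4]))  # the negative weight is clipped to 0
```

The softmax route lets gradient descent run on unconstrained parameters, while the projection route keeps the weights themselves as the trained quantities; either keeps (1) well defined.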



Figure 1: Network for generating fuzzy rules (inputs: levels L, SL, M, SH and H of Feature 1 through Feature N).
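The three-layer structure used in Section 4 (n levels per feature, nN input nodes, nN hidden nodes each reading n - 1 levels of one feature, and a fully connected top layer) can be written down as connectivity lists. The assumption that each hidden node of a feature omits a different level, and the node names, are illustrative choices for this sketch.

```python
LEVELS = ["L", "SL", "M", "SH", "H"]  # the five linguistic levels used in Section 4


def build_structure(features, classes, levels=LEVELS):
    """Connectivity of the three-layer rule-generation network.

    Returns (inputs, hidden, outputs): inputs lists the (feature, level) nodes;
    hidden maps each hidden node to the n - 1 input nodes it reads;
    outputs maps each class node to every hidden node.
    """
    inputs = [(f, lv) for f in features for lv in levels]
    hidden = {}
    for f in features:
        for skipped in levels:
            # One hidden node per (feature, omitted level), reading the other n - 1 levels
            hidden[(f, "not " + skipped)] = [(f, lv) for lv in levels if lv != skipped]
    outputs = {c: list(hidden) for c in classes}
    return inputs, hidden, outputs


inputs, hidden, outputs = build_structure(["Difference Entropy", "Contrast"],
                                          ["Shuttle", "Background"])
print(len(inputs), len(hidden), len(next(iter(hidden.values()))))  # 10 10 4
```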
Our experiments indicate that the choice of the network structure is not very critical. Also, any
compensative aggregation operator seems to yield good results. In all the results shown in this
paper, we used the generalized mean as the aggregation operator. As indicated in Section 2, the
generalized mean can closely approximate a union (intersection) operator for a large positive
(negative) value of p. We start the training with the generalized mean aggregation function with
p = 1. If the training data is better described by a union (intersection) operator, then the value
of p will keep increasing (decreasing) as the training proceeds, until the training is terminated
when the error becomes acceptable. Also, the weights $w_i$ in (1) may be interpreted as the relative
importance factors for the different criteria. Initially, we start the training with all the weights
associated with a node being equal. As the training proceeds, the weights automatically adjust so
that the overall error decreases, and some of the weights eventually become very small. Thus, the
training procedure has the ability to detect certain types of redundancies in the network. In
general, there are three types of redundancies (irrelevant criteria) that are encountered in
decision making [17]. These correspond to uninformative, unreliable and superfluous criteria.
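The training behaviour described above can be illustrated on a single generalized-mean node. This is a toy sketch under stated assumptions, not the authors' implementation: the data are synthetic (the target is the maximum, i.e. an OR, of two inputs, plus a third constant and therefore uninformative input), the weights are kept valid through a softmax parameterization, and backpropagation is replaced by central finite-difference gradients for brevity. Starting from p = 1 and equal weights, p should drift upward and the weight of the constant input should shrink.

```python
import math
import random


def gmean(x, w, p):
    """Weighted generalized mean of equation (1); x in (0, 1], sum(w) = 1, p != 0."""
    return sum(wi * xi ** p for xi, wi in zip(x, w)) ** (1.0 / p)


def softmax(v):
    """Nonnegative weights summing to 1 (keeps the constraint of (1) satisfied)."""
    m = max(v)
    e = [math.exp(vi - m) for vi in v]
    s = sum(e)
    return [ei / s for ei in e]


def mse(params, data):
    """Mean squared error of the node; params = [p, v_1, ..., v_n]."""
    p, w = params[0], softmax(params[1:])
    return sum((gmean(x, w, p) - t) ** 2 for x, t in data) / len(data)


def train(data, n_inputs, lr=0.5, steps=1000, h=1e-5):
    # Start as described in the text: p = 1 and equal weights (v = 0 gives w_i = 1/n)
    params = [1.0] + [0.0] * n_inputs
    for _ in range(steps):
        grad = []
        for k in range(len(params)):
            # Central finite difference stands in for backpropagation here
            up, down = params[:], params[:]
            up[k] += h
            down[k] -= h
            grad.append((mse(up, data) - mse(down, data)) / (2 * h))
        params = [pk - lr * gk for pk, gk in zip(params, grad)]
    return params[0], softmax(params[1:])


# Synthetic data: target = OR (max) of inputs 0 and 1; input 2 is constant (uninformative)
random.seed(0)
data = []
for _ in range(50):
    a, b = random.uniform(0.1, 1.0), random.uniform(0.1, 1.0)
    data.append(([a, b, 0.5], max(a, b)))

p, w = train(data, 3)
print("p =", round(p, 2), "weights =", [round(wi, 3) for wi in w])
```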

Uninformative Criteria: These are criteria whose degrees of satisfaction are always approximately 
the same, regardless of the situation. Therefore, these criteria do not provide any information about 
the situation, thus contributing little to the decision-making process. For example, low texture 
content is a criterion that is always satisfied for both clear skies and roads, and hence it would be an
uninformative criterion if one needs to distinguish between these two labels. Uninformative criteria
do not contribute to the robustness of the decision making process, and therefore it is desirable that 
they be eliminated. 

Unreliable Criteria: These correspond to criteria whose degrees of satisfaction do not affect the 
final decision. In other words, the final decision is the same for a wide range of degrees of 
satisfaction. For example, color would be an unreliable criterion for distinguishing a rose from a 
hibiscus because they both come in similar colors. Unreliable criteria do not contribute to the 
robustness of the decision making process, and therefore it is desirable that they be eliminated. 
Superfluous Criteria: These are criteria which, strictly speaking, are not required to make the
decision. Decisions made without considering such criteria may be just as accurate and reliable.
For example, one may want to differentiate planar surfaces from spherical surfaces using Gaussian 
and mean curvatures, but the criteria are superfluous because either one of them is sufficient to 
distinguish between planar and spherical surfaces. However, redundancies of this type are not 
entirely without utility, since such redundancies make the decision making process more robust. If 
one criterion fails for some reason, we may still be able to arrive at the correct decision using the 
other. Hence such redundancies may be desirable to increase the robustness of the decision-making 
process. 

Redundancy Detection and Estimation of Confidence Factors: A connection is considered
redundant if the weight associated with it gradually approaches zero (or falls below a small threshold value)
as the learning proceeds. A node (associated with a criterion) is considered redundant if all the 
connections from the output of this node to other nodes become redundant. Our simulations show 
that both in the case of uninformative criteria and unreliable criteria, the weights corresponding to 
all the output connections go to zero. Therefore such nodes (criteria) are eliminated from the 
structure. The examples in Section 4 illustrate this idea. 
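The two redundancy tests stated above (a connection is redundant when its weight falls below a small threshold; a node is redundant when all of its outgoing connections are) can be expressed directly. The weights in the example are hypothetical, and the default threshold of 0.01 is borrowed from the experiment in Section 4.

```python
def find_redundant(connections, threshold=0.01):
    """connections: dict mapping (source, target) -> trained weight.

    Returns the set of redundant connections and the set of redundant source nodes.
    """
    redundant_links = {link for link, w in connections.items() if w < threshold}
    outgoing = {}
    for src, dst in connections:
        outgoing.setdefault(src, []).append((src, dst))
    # A node is redundant only if every one of its outgoing connections is redundant
    redundant_nodes = {src for src, links in outgoing.items()
                       if all(link in redundant_links for link in links)}
    return redundant_links, redundant_nodes


# Hypothetical trained weights: both connections from criterion "c3" collapsed
weights = {("c1", "A"): 0.626, ("c2", "A"): 0.37, ("c3", "A"): 0.004,
           ("c1", "B"): 0.30, ("c2", "B"): 0.695, ("c3", "B"): 0.005}
links, nodes = find_redundant(weights)
print(sorted(nodes))  # ['c3']
```

Node "c3" would then be removed from the structure, together with its two connections.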

Rule Generation: The networks that finally result from this training process can be said to represent 
rules that may be used to make the decisions. If the final value of the parameter p at a given node is 
greater than one, the nature of the connective is disjunctive. If the value is less than one, it is 
conjunctive. Once the nature of the connective at each node is determined, we can easily construct 
the fuzzy rules that describe the input-output relations. In Section 4 we present some examples of 
this approach. 
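The mapping from a pruned network to a rule can be sketched recursively: each surviving node becomes an OR when its final p exceeds one and an AND when p is below one, as stated above. The tree representation, the handling of p = 1, and the p values in the example are hypothetical (they are not the values of Table 1); the example only reproduces the logical shape of rule (6) in Section 4.

```python
def connective(p):
    """Map a trained exponent p to a logical connective (Section 3)."""
    if p > 1:
        return "OR"    # disjunctive
    if p < 1:
        return "AND"   # conjunctive
    return "MEAN"      # p == 1 (arithmetic mean): an assumption of this sketch


def to_rule(node):
    """node: a leaf label (str) or a tuple (children, p) from a pruned network."""
    if isinstance(node, str):
        return node
    children, p = node
    parts = [to_rule(child) for child in children]
    if len(parts) == 1:
        return parts[0]  # a single surviving input passes straight through
    return "(" + f" {connective(p)} ".join(parts) + ")"


# Hypothetical pruned network with the same logical shape as rule (6)
background = ([(["Difference Entropy is SL", "Difference Entropy is SH"], 3.0),
               "Contrast is L"], 0.5)
print("IF", to_rule(background), "THEN the class is Background")
```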

4. Experimental Results

In this section, we present some typical experimental results involving real data to show the 
effectiveness of the proposed automatic rule generation method. The method is shown to generate 
decision rules that best describe the decision criteria for the classes in the experiment. Figure 1
shows the general three-layer network used to generate the rules. The input layer consists of nN
input nodes, where N is the number of fuzzy features or criteria (such as properties and
relationships) and n is the number of linguistic levels used to partition each feature. The hidden
layer has nN nodes, each of which is connected to all but one (i.e., to n-1) of the input nodes
representing the levels of a single feature. The top layer is fully connected to the hidden
layer. In the experimental results shown here, we used 5 fuzzy linguistic levels to represent each
feature; therefore, each hidden node has 4 connections. Other types of network structures were
also tried; however, the one described above produced the best results. The target values in the
training data were chosen to be 1.0 for the class from which the training data was extracted, and
0.0 for the remaining classes. The feature values were always normalized so that they fall in the
range [0,1]. Figure 2 depicts the trapezoidal fuzzy sets used to model the intuitive notions of the
five linguistic levels LOW (L), SOMEWHAT LOW (SL), MEDIUM (M), SOMEWHAT HIGH (SH), and HIGH (H).


Figure 2: Graphical representations of various fuzzy sets (L, SL, M, SH, H).
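The breakpoints of the trapezoids in Figure 2 are not recoverable from the text, so the sketch below assumes evenly spaced levels on the normalized range [0, 1]: flat tops of width 0.1 centred at 0, 0.25, 0.5, 0.75 and 1 (clipped at the ends of the range), joined by linear ramps of width 0.15. With these choices adjacent sets cross at 0.5 and the five memberships sum to one everywhere on [0, 1].

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Assumed breakpoints (a, b, c, d) for the five levels; not taken from Figure 2
LEVELS = {
    "L": (0.0, 0.0, 0.05, 0.20),
    "SL": (0.05, 0.20, 0.30, 0.45),
    "M": (0.30, 0.45, 0.55, 0.70),
    "SH": (0.55, 0.70, 0.80, 0.95),
    "H": (0.80, 0.95, 1.0, 1.0),
}


def fuzzify(x):
    """Membership of a normalized feature value x in each linguistic level."""
    return {name: trapezoid(x, *abcd) for name, abcd in LEVELS.items()}


print(fuzzify(0.375))  # halfway between the SL and M plateaus
```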
4.1 Example

Figure 3(a) shows a 200x200 image used for training in order to obtain rules that best
describe the object (shuttle) and the background. After examining a variety of possible features,
we chose the two best: the difference entropy and contrast features. For definitions of these
features, see the report on membership generation methods. Figures 3(b) and 3(c) show the
corresponding feature images. Figure 3(d) shows the scatter plot of the training samples
extracted from two different regions (shuttle and background) in the image. We used 50 samples
from each class. The membership values in each linguistic level for each sample are computed using
the membership functions shown in Figure 2, and these, together with the corresponding desired
targets, are used as training data in the training algorithm described in Section 3. Figure 4 shows
the reduced network after training. All the connections with weights below 0.01 were considered
redundant. Table 1 shows the final weights (which determine the confidence factors of the rules
and criteria) and the p parameter values (which determine the conjunctive or disjunctive nature of
the connective) for the specified nodes in Figure 4. Using the p values obtained, the following
rules are generated, as discussed in Section 3.

$$
\text{Class Shuttle} = (\text{Difference Entropy}_{M} \vee \text{Difference Entropy}_{SH} \vee \text{Difference Entropy}_{H}) \vee \text{Contrast}_{SL} \tag{5}
$$

In other words, the rule may be summarized as 

R_Shuttle: IF Difference Entropy is M or SH or H, or Contrast is SL,
THEN the class is Shuttle.

Similarly, 

$$
\text{Class Background} = (\text{Difference Entropy}_{SL} \vee \text{Difference Entropy}_{SH}) \wedge \text{Contrast}_{L} \tag{6}
$$

and 

R_Background: IF Difference Entropy is SL or SH, and Contrast is L,
THEN the class is Background.
These rules make sense, since expanding (5) and (6) yields the appropriate cells of the feature
space where the training samples are located in Figure 3(d).
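To apply rules (5) and (6) to a new sample, one needs fuzzy versions of their OR and AND. The sketch below assumes max for OR and min for AND, a common choice but not the operator the trained network itself uses (the generalized mean with the learned p and weights); the membership values of the sample are hypothetical.

```python
def shuttle_support(de, con):
    # Rule (5): Difference Entropy is M or SH or H, or Contrast is SL (OR as max)
    return max(de["M"], de["SH"], de["H"], con["SL"])


def background_support(de, con):
    # Rule (6): (Difference Entropy is SL or SH) and Contrast is L (AND as min)
    return min(max(de["SL"], de["SH"]), con["L"])


def classify(de, con):
    """Return the winning label and both degrees of support."""
    s, b = shuttle_support(de, con), background_support(de, con)
    return ("Shuttle" if s >= b else "Background"), s, b


# Hypothetical memberships of one pixel's two features in the five levels
de = {"L": 0.0, "SL": 0.8, "M": 0.2, "SH": 0.0, "H": 0.0}
con = {"L": 0.9, "SL": 0.1, "M": 0.0, "SH": 0.0, "H": 0.0}
print(classify(de, con))  # ('Background', 0.2, 0.8)
```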



Figure 3(a): image for training, (b): difference entropy image, and (c): contrast image.
Figure 3(d): Scatter plot of training samples for the classes shuttle (x) and background (o).



Figure 4: Reduced network after training.

Table 1: Values of weights and parameter p for the reduced network.

Node      Weights               p
node 1    0.70, 0.15, 0.15      5.48
node 2    0.94, 0.06           -0.21
node 3    0.49, 0.01, 0.50      7.04
node 4    0.94, 0.06            4.00
node 5    1.0                   0.78
node 6    1.0                   1.88
node 7    1.0                   1.88


4.2 Segmentation 

Figure 5(a) shows a 200x200 test image that was segmented using the reduced network obtained after
training (Figure 4). Figures 5(b) and 5(c) show images of the two features (difference entropy and
contrast) that were chosen previously. After applying the shrink and expand algorithm to remove
noise points, the resulting segmented image is shown in Figure 5(d).
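The report does not give details of the shrink and expand algorithm. A minimal sketch, assuming a binary label image, a 3x3 neighbourhood clipped at the image border, and a single shrink (erosion) followed by a single expand (dilation), which together form a morphological opening, is shown below. In the example, an isolated noise pixel vanishes while a solid 3x3 object is restored unchanged.

```python
def _neighbourhood(img, r, c):
    """Values in the 3x3 window around (r, c), clipped at the image border."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(r - 1, 0), min(r + 2, rows))
            for j in range(max(c - 1, 0), min(c + 2, cols))]


def shrink(img):
    # A pixel stays 1 only if its whole (clipped) 3x3 neighbourhood is 1
    return [[int(all(_neighbourhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def expand(img):
    # A pixel becomes 1 if any pixel in its (clipped) 3x3 neighbourhood is 1
    return [[int(any(_neighbourhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def shrink_then_expand(img):
    return expand(shrink(img))


img = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        img[r][c] = 1          # a solid 3x3 object
img[0][6] = 1                  # an isolated noise pixel
cleaned = shrink_then_expand(img)
print(cleaned[0][6], sum(map(sum, cleaned)))  # 0 9
```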

5. Summary and Conclusions 

In this paper, we introduced a new method for automatically generating rules for high-level
vision. The range of each feature is fuzzily partitioned into several linguistic intervals such as
LOW, MEDIUM and HIGH. The membership function for each level is determined, and the
membership values of an observed feature value in each of the linguistic levels are calculated using
these membership functions. The memberships are then aggregated in a fuzzy aggregation
network. The networks are trained with typical data to learn the aggregation connectives and
connections that would give rise to the desired decisions. The learning process can also be made to
discard redundant features. The networks that finally result from this training process can be said
to represent rules that may be used to make the decisions. Riseman et al. used similar rules for
segmentation and labeling of outdoor scenes, but the weights used in the aggregation scheme were
determined empirically [19]. The ability to generate rules that can be used in fuzzy logic and
rule-based systems directly from training data is a novel aspect of our approach. One issue that
requires investigation is the choice of the number of linguistic levels and its effect on the decision
making process.



Figure 5(a): image for testing, (b): difference entropy image,
(c): contrast image, and (d): segmented image.